National agencies usually publish disease data (e.g. mortality, incidence, prevalence) for large population subgroups, commonly including
- age group and sex
- area
Health impact models want these for smaller subgroups, e.g.
- smaller areas
- socioeconomic indicators (deprivation indices, education, etc.)
- combinations of all of the above
In a microsimulation model, synthetic individuals are labelled with several characteristics. How finely can we describe their disease risk?
National agencies may publish cross-tabulations of disease outcomes:
- by age/sex, by small area, by socioeconomic indicators separately
- but not by all of these factors jointly
There may also be cohort studies or other literature giving estimates of disease outcomes by particular predictors (e.g. education).
We cannot know unobserved quantities for certain, but we can estimate them under clearly stated assumptions.
For example: an assumption that some risk factors act independently on the risk of a disease outcome.
Mortality rates by year of age and sex, for whole state.
Standardised mortality rates \(r^{std}_i\) by small areas \(i\)
The standardised rate reflects the expected number of deaths in the area if the age/sex balance of the area were the same as that of the standard population.
Convert to an excess rate \(r^{std}_i / r^{std}_{ave}\), relative to the average.
The average of the area-specific rates is weighted by area population \(n_i\): \(r^{std}_{ave} = \sum_i n_i r^{std}_i / \sum_i n_i\)
Estimate the small-area-specific rate for a particular age/sex as
- the large-area rate by age/sex, multiplied by
- the excess age/sex standardised rate for the small area
General principle: estimate the rate by risk factors A x B as
- the rate by risk factor A, multiplied by
- the excess rate (standardised relative to A) for risk factor B

Assumption: risk factors act independently.
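
The multiplication above can be sketched in Python as follows. This is only an illustration under the independence assumption: the function name `disaggregate_by_area`, the area labels and all the numbers are hypothetical and are not taken from any published dataset.

```python
def disaggregate_by_area(rate_by_age_sex, std_rate_by_area, pop_by_area):
    """Estimate rates by (age, sex, area), assuming area acts independently of age/sex."""
    total_pop = sum(pop_by_area.values())
    # Population-weighted average of the standardised area rates (r_std_ave)
    r_std_ave = sum(pop_by_area[a] * std_rate_by_area[a] for a in std_rate_by_area) / total_pop
    # Excess (relative) rate of each small area compared with the average
    excess = {a: r / r_std_ave for a, r in std_rate_by_area.items()}
    # Large-area rate by age/sex, multiplied by the area's excess rate
    return {
        (age, sex, a): rate * excess[a]
        for (age, sex), rate in rate_by_age_sex.items()
        for a in excess
    }


# Hypothetical inputs, for illustration only
rate_by_age_sex = {(70, "F"): 0.02, (70, "M"): 0.03}   # whole-state rates
std_rate_by_area = {"A": 0.008, "B": 0.012}            # standardised rates r_std_i
pop_by_area = {"A": 1000, "B": 3000}                   # area populations n_i

est = disaggregate_by_area(rate_by_age_sex, std_rate_by_area, pop_by_area)
for key in sorted(est):
    print(key, round(est[key], 5))
```

Because the excess rates are defined relative to a population-weighted average, their weighted average is 1, so the population-weighted average of the estimated area rates for a given age/sex recovers the large-area rate.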
Mortality by age and sex
Mortality by area
Mortality disaggregated by year of age, sex and area
We now have (estimates of) mortality by year of age, sex and small area.
Now we want to disaggregate this further to account for socioeconomic variations within the area
Use data on
- the relative risk of mortality for high versus low education (by broad age group), assuming this effect is the same in all small areas
- the proportion of people in each area with different levels of education
Hence infer mortality in each area by year of age, sex, and education
Take the mortality data for a specific group (e.g. year of age, sex, small area)
| Died | Survived |
|---|---|
| 6% | 94% |

and the proportions of the same group with each level of education
| Low education | High education |
|---|---|
| 60% | 40% |
Problem is to fill in the 2 x 2 table
| | Died | Survived | Total |
|---|---|---|---|
| Low education | | | 60% |
| High education | | | 40% |
| Total | 6% | 94% | 100% |
Given knowledge of one cell, we can deduce the other cells
| | Died | Survived | Total |
|---|---|---|---|
| Low education | x | 60%-x | 60% |
| High education | 6%-x | 34%+x | 40% |
| Total | 6% | 94% | 100% |
If we know the relative risk of death between the two education groups, we can deduce x, and fill in the whole table.
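
As a rough illustration, the table can be filled in with a few lines of Python. The margins (6% died, 60% low education) are those of the example, but the relative risk of 1.5 for low versus high education is an assumed value chosen purely for demonstration, and `fill_two_by_two` is a hypothetical helper name. Here x is the proportion who both have low education and died, i.e. the low-education proportion times the low-education mortality risk.

```python
def fill_two_by_two(p_died, p_low, rr_low_vs_high):
    """Fill a 2 x 2 education-by-outcome table from its margins and a relative risk."""
    p_high = 1.0 - p_low
    # Overall risk is a weighted average: p_died = p_low * r_low + p_high * r_high,
    # with r_low = rr_low_vs_high * r_high
    r_high = p_died / (p_low * rr_low_vs_high + p_high)
    r_low = rr_low_vs_high * r_high
    x = p_low * r_low  # joint proportion: low education AND died
    return {
        ("low", "died"): x,
        ("low", "survived"): p_low - x,
        ("high", "died"): p_died - x,
        ("high", "survived"): p_high - (p_died - x),
    }


# Margins from the example; the relative risk of 1.5 is an assumption
table = fill_two_by_two(p_died=0.06, p_low=0.60, rr_low_vs_high=1.5)
for cell, value in table.items():
    print(cell, f"{100 * value:.2f}%")
```

The four cells sum to 100%, and the ratio of the row-specific risks of death reproduces the relative risk that was supplied.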
Algebraic explanation: we know, for some population:
\(r_{ave}\): average mortality rate
\(p_0, p_1 = 1 - p_0\): proportions without/with the risk factor
\(RR = r_1 / r_0\): relative mortality with/without the risk factor
and we want \(r_0, r_1\): mortality rates without/with the risk factor
\[ \begin{split} r_{ave} & = p_0 r_0 + p_1 r_1 \\ & = p_0 r_0 + p_1 r_0 RR \\ r_0 & = r_{ave} / (p_1 RR + p_0) \end{split} \]
The same approach extends to a risk factor with more than two levels, e.g. a socioeconomic index with levels \(i = 1, 2, ..., G\). We know
\(p_i\): proportion of population in level \(i\)
\(RR_i\): relative risk of mortality for level \(i\), compared to level 1 (so \(RR_1 = 1\))
and want to obtain \(r_i\), absolute mortality in level \(i\)
\[ r_{ave} = p_1 r_1 + p_2 r_2 + ... = r_1 \sum_{i=1}^G p_i RR_i \]
which gives \(r_1 = r_{ave} / \sum_{i=1}^G p_i RR_i\) in terms of known quantities.
Then compute \(r_i = r_1 RR_i\)
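
A minimal Python sketch of this calculation is given below. The function name `disaggregate_rate` and the three-level example values are hypothetical, chosen only to show the arithmetic. The first level is treated as the reference, so its relative risk should be 1.

```python
def disaggregate_rate(r_ave, proportions, relative_risks):
    """Split an average rate into level-specific rates r_i = r_1 * RR_i."""
    if len(proportions) != len(relative_risks):
        raise ValueError("proportions and relative_risks must have the same length")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # r_ave = r_1 * sum_i p_i RR_i, so solve for the reference-level rate r_1
    denom = sum(p * rr for p, rr in zip(proportions, relative_risks))
    r_ref = r_ave / denom
    return [r_ref * rr for rr in relative_risks]


# Hypothetical three-level index; level 1 is the reference (RR = 1)
p = [0.5, 0.3, 0.2]
rr = [1.0, 1.2, 1.5]
rates = disaggregate_rate(r_ave=0.06, proportions=p, relative_risks=rr)
print([round(r, 4) for r in rates])
```

A useful check on any implementation is that the population-weighted average of the returned rates reproduces \(r_{ave}\).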
General principle for disaggregating tabular data on some outcome (e.g. disease mortality, incidence)
Estimate joint effects of multiple risk factors, given effect of each separately
Requires relative rates/risks + population sizes for each risk factor, or standardised rates.
Assumes that different risk factors act independently